APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins

Authors

  • Malvika Sharan
  • Konrad U. Förstner
  • Ana Eulalio
  • Jörg Vogel
Abstract

RNA-binding proteins (RBPs) are established core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the large-scale identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in eukaryotic species such as human and yeast. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is poorer because of the technical challenges associated with existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains, and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT outperformed other existing tools for the sequence-based prediction of RBPs on various datasets, achieving an average sensitivity and specificity of 0.90 and 0.91, respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot.
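
To make the idea of motif detection with a position-specific scoring matrix concrete, the following is a minimal sketch and not APRICOT's implementation. The four-letter alphabet, the toy alignment, the uniform background and the names `build_pssm` and `best_window` are all assumptions chosen for illustration; a real pipeline would use the 20 amino acids and curated domain models.

```python
import math

# Hypothetical toy alphabet and motif alignment; not data from APRICOT.
ALPHABET = "ACDE"
ALIGNED_MOTIFS = ["ACD", "ACE", "ACD", "DCD"]


def build_pssm(seqs, alphabet=ALPHABET, pseudocount=1.0):
    """Build one dict per alignment column holding log2-odds scores
    against a uniform background (an assumption of this sketch)."""
    background = 1.0 / len(alphabet)
    pssm = []
    for col in range(len(seqs[0])):
        # Start every residue at the pseudocount so unseen residues
        # get a finite (negative) score instead of log(0).
        counts = {a: pseudocount for a in alphabet}
        for s in seqs:
            counts[s[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {a: math.log2((counts[a] / total) / background) for a in alphabet}
        )
    return pssm


def best_window(pssm, sequence):
    """Slide the matrix along the sequence and return (best_score, start)."""
    width = len(pssm)
    scores = [
        (sum(pssm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)


pssm = build_pssm(ALIGNED_MOTIFS)
score, start = best_window(pssm, "EEACDAA")
print(f"best window starts at index {start} with score {score:.2f}")
```

A positive score means the window resembles the motif more than the background does; a score threshold would then decide which windows count as hits.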

Similar resources

Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)

Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses post-transcriptionally. Objective: In the present study, we report the results of a systematic search for identification of new miRNAs in B. rapa using homology-based ...

Identification of RNA-binding sites in artemin based on docking energy landscapes and molecular dynamics simulation

There are questions concerning the functions of artemin, an abundant stress protein found in Artemia during embryo development. It has been reported that artemin binds RNA at high temperatures in vitro, suggesting an RNA protective role. In this study, we investigated the possibility of the presence of RNA-binding sites and their structural properties in artemin, using docking energy ...

Biochemical characterization of PE_PGRS61 family protein of Mycobacterium tuberculosis H37Rv reveals the binding ability to fibronectin

Objective(s): The periodic binding of proteins expressed by Mycobacterium tuberculosis H37Rv to host cell receptor molecules, i.e. fibronectin (Fn), is gaining significance because of its adhesive properties. The genome sequencing of M. tuberculosis H37Rv revealed that the proline-glutamic (PE) proteins contain polymorphic GC-rich repetitive sequences (PGRS) which have clinical importance i...

Identification of Gyrodactylus gurleyi in Carassius auratus using morphometric and molecular characterization

BACKGROUND: Gyrodactylus is a small monogenean ectoparasite that lives on the skin and fins of most of the world's fish species. Gyrodactylus appears to be one of the most prevalent parasites found in ornamental fish, especially in cyprinids. Goldfish (Carassius auratus) are popular ornamental fish that are heavily infested with Gyrodactylus. OBJECTIVES: The present study is aimed to identi...

Cloning and Characterization of cbhII Gene from Trichoderma parceramosum and Its Expression in Pichia pastoris

The genomic and cDNA clones encoding cellobiohydrolase II (CBHII) have been isolated and sequenced from a native Iranian isolate of Trichoderma parceramosum that produces high levels of cellulolytic enzymes. This represents the first report of the cbhII gene from this organism. Comparison of genomic and cDNA sequences indicates that this gene contains three short introns and also an open reading frame codin...

Journal title:

Volume 45  Issue 

Pages  -

Publication date: 2017